PERF: Index.join to maintain cached attributes in more cases #57023

lukemanley · 2024-01-23T01:45:39Z

Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/v3.0.0.rst file if fixing a bug or adding a new feature.

import pandas as pd

data = [f"i-{i:05}" for i in range(100_000)]
dtype = "string[pyarrow_numpy]"

idx1 = pd.Index(data, dtype=dtype)
idx2 = pd.Index(data[1:], dtype=dtype)

# the is_unique call at the end is cached in this PR
%timeit idx1.join(idx2, how="outer").is_unique

# 59.1 ms ± 1.29 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)  -> main
# 41.9 ms ± 894 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)   -> PR
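
The speedup above comes from the join result keeping an already-computed `is_unique` instead of recomputing it. As a rough illustration only, here is a minimal toy sketch of that idea, assuming the join result can sometimes be one of the input objects. `ToyIndex`, `wrap_join_result`, and `outer_join` are hypothetical names invented for this sketch; this is not the pandas implementation, which may detect the reusable case differently.

```python
from functools import cached_property


class ToyIndex:
    """Hypothetical stand-in for an index; not pandas code."""

    scans = 0  # counts how many times uniqueness was actually computed

    def __init__(self, values):
        self.values = list(values)

    @cached_property
    def is_unique(self):
        # Expensive in a real index; computed once per object, then cached.
        ToyIndex.scans += 1
        return len(set(self.values)) == len(self.values)


def wrap_join_result(joined_values, left, right):
    """Return an input object when the joined values match it, so its cache survives."""
    if joined_values == left.values:
        return left
    if joined_values == right.values:
        return right
    return ToyIndex(joined_values)  # new object, empty cache


def outer_join(left, right):
    # Toy outer join for sorted, unique inputs: the sorted union of the values.
    joined = sorted(set(left.values) | set(right.values))
    return wrap_join_result(joined, left, right)


left = ToyIndex(f"i-{i:05}" for i in range(1000))
right = ToyIndex(left.values[1:])

left.is_unique                    # first access computes and caches the answer
result = outer_join(left, right)  # the union equals left, so left itself is returned
result.is_unique                  # served from left's cache; no second scan
```

In this sketch the second `is_unique` access costs nothing because `result` is the same object as `left`. When the joined values match neither input, a fresh `ToyIndex` is built and its cache starts empty.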

mroeschke · 2024-01-24T02:52:07Z

Thanks @lukemanley

…dev#57023)

* Index.join result name
* whatsnew
* update test
* Index._wrap_join_result to maintain cached attributes if possible
* Index._wrap_join_result to maintain cached attributes if possible
* whatsnew
* allow indexers to be None
* gh ref
* rename variables for clarity

lukemanley added 9 commits January 18, 2024 19:57

Index.join result name

db1f094

whatsnew

7ed5207

Merge remote-tracking branch 'upstream/main' into index-join-result-name

319d470

update test

5bd6856

Index._wrap_join_result to maintain cached attributes if possible

b654f8f

Index._wrap_join_result to maintain cached attributes if possible

9dc472d

Merge remote-tracking branch 'upstream/main' into wrap_join_result

b4840bc

Merge branch 'wrap_join_result' of https://github.com/lukemanley/pandas into wrap_join_result

842d26c

whatsnew

02f773e

lukemanley added the Performance (memory or execution speed performance) and Index (related to the Index class or subclasses) labels Jan 23, 2024

lukemanley added this to the 3.0 milestone Jan 23, 2024

lukemanley added 2 commits January 22, 2024 21:14

allow indexers to be None

1dd35e4

gh ref

43fe322

mroeschke reviewed Jan 23, 2024

pandas/core/frame.py (outdated, resolved)

lukemanley added 2 commits January 23, 2024 19:46

Merge remote-tracking branch 'upstream/main' into wrap_join_result

3aa5f0c

rename variables for clarity

a43e4d4

mroeschke approved these changes Jan 24, 2024

mroeschke merged commit 622f31c into pandas-dev:main Jan 24, 2024
